attempts to show that engine failures are dependent upon some combination of historical flying hour and sortie data. A method of arrangement was undertaken which transformed the collected data into specific historical groupings. These groups of data were then statistically analyzed and a forecasting model developed. The statistical analysis was performed by applying multiple correlation and regression techniques to the data. The Biomedical Series BMD02R Stepwise Multiple Regression package was chosen for use. The predictive power of the model was evaluated, the statistical assumptions tested, research conclusions drawn, and recommendations made for further studies.
Chapter 1


INTRODUCTION 


Background

The United States Air Force has over six billion dollars invested in jet aircraft engines. (22)* To maintain these engines, additional investments are made in highly skilled labor and complex equipment. The engines are easily damaged and time-consuming to repair. Although any one of these investments could be studied for efficiency and cost control, it seems only logical that the primary way to minimize aircraft engine costs is to have only the required number on hand. Basic to the establishment and effective management of this "required" inventory is an accurate technique for predicting future requirements. Accuracy in this forecasting technique will not only yield monetary savings but will also enhance the operational capability and effectiveness of Air Force weapon systems.

The current Air Force methodology for computing jet engine spare requirements is based upon the actuarial forecasting concept. This program consists of

. . . the development and use of actuarial mathematics and the theory of probability for determination of failure rates and the life expectancy [for jet aircraft engines]. (5:1-2)

*The first number refers to the Bibliography reference number, the second refers to page number(s); e.g.,

The application of these actuarial principles to jet engine demand forecasting is based upon the assumption that engine failures are a function of age, measured in terms of aircraft flying hours. AFM 400-1, Volume III, and T.O. 00-25-128 explain the Air Force actuarial forecasting system in detail. Air Force Logistics Command (AFLC) managers have frequently questioned the validity of this forecasting method, because actual jet engine failures have varied widely from AFLC predictions. Consequently, current management feels that in using accumulated flying hours as the sole tool for predicting engine failures, they may be neglecting other critical factors. (22)

Another major concern with the actuarial forecasting technique is that it draws upon past data and does not provide for the input of variables based upon expected future states of nature, such as sorties. (20:8)

Considerable effort is being expended by many government researchers in these two areas. This thesis limits its study to the consideration of flying hours and sorties as feasible predictors of engine failures. Historical records and estimates of future flying programs will be the basic sources of data.
The Problem

The number of spare aircraft engines required for a specific model of aircraft in the Air Force inventory is currently determined as a function of flying hours. Air Force Logistics Command managers suspect that the sole use of flying hours as a demand prediction tool does not yield an accurate picture of engine demands. A major portion of that demand is premature engine failures, which are also currently predicted on the basis of flying hours. If an accurate forecast of premature engine failures can be made, the inventory manager can then make a much improved estimate of the resources he will have to expend in support of aircraft propulsion units. The specific problem, therefore, may be phrased as the question: Are there other aircraft program activities which can be used as demand prediction tools to provide a more accurate estimate of engine failures?

Assumptions  and  Limitations 

The authors believe that certain basic assumptions must be made before a program activity can be used as a demand prediction tool. These assumptions are as follows:

1. The future of a program activity (such as flying hours or sorties) can be accurately forecast.

2. A reliable, measurable relationship exists between the demand element and the program activity.

3. The data generated by the program activities are accurate.
There were three major limitations placed upon this study. They were:

1. Only one aircraft-engine combination was to be used for study.

2. A minimum of three years' data was to be collected for analysis.

3. The selection of program activities was limited to those activities currently being measured by quantitative techniques prescribed in Air Force directives.

The objectives of this thesis were threefold:

1. Identify program activities that may be suitable for use as engine failure prediction tools.

2. Develop a failure prediction model with regression analysis techniques.

3. Statistically test and evaluate the developed model.

Hypothesis

The research methodology used in this thesis was designed to test the following hypothesis: A combination of flying hours and sorties can be used to yield accurate jet aircraft engine failure forecasts.

Overview 

This thesis attempts to show that engine failures are dependent upon some combination of historical flying hour and sortie data. A method of arrangement was undertaken which transformed the collected data into specific historical groupings. These groups of data were then statistically analyzed and a forecasting model developed. The statistical analysis was performed by applying multiple correlation and regression techniques to the data. The Biomedical Series BMD02R Stepwise Multiple Regression package (10) was chosen for use because of its completeness in data output and apparent versatility in use.

Chapter 2 sets forth how the data was collected, screened, and prepared for use in the development of a set of predictive models. The third chapter delineates in detail the model development, the test of statistical assumptions, and the evaluation of predictive power. Chapter 4 contains the interpretation of model behavior, conclusions concerning the effectiveness of the model, and recommendations for further study.
Chapter 2


DATA  COLLECTION  AND  ARRANGEMENT 


This chapter describes how the data was collected, screened, and prepared for use in the development of a set of predictive models. It is broken into six topics: (1) Selection of Program Activities, (2) Selection of Aircraft-Engine Combination, (3) Data Collection, (4) Data Screening, (5) Preliminary Data Study, and (6) Data Arrangement. Model development, verification, and validation are discussed in the succeeding chapter.


Selection of Program Activities

Flying hours and sorties were the program activities chosen for this study of jet aircraft engine failures. Design, operational, and logistical personnel have long considered accumulated flying hours a measure of aircraft engine life. Although useful, this measure unfortunately does not produce the accuracy in predicting engine failures desired by AFLC.

Engine design personnel seem to have concluded that frequent short periods of extreme temperature have a detrimental effect on the life of a jet engine. This condition is normally generated by demanding maximum thrust from the engine. The running of an engine into and out of this critical temperature range is defined as a cycle. At present, engine cycles are not recorded in any manner. However, sorties are recorded, and if it can be assumed that a sortie contains at least one cycle, generated at take-off, then there exists an imperfect but possibly useful measure of cycles that could be applied to the explanation of engine failures. This reasoning, coupled with the idea that simple frequency of use may shorten engine life, seemed to provide a sufficient argument for considering sorties as an element of engine failures. Finally, both of these activities are readily understood, easily measured, and currently recorded.

Flying hours. Flying hours are defined as "all time of flight of a military aircraft creditable to the aircraft, its equipment and personnel aboard." (4:100) Currently all engine forecasts are based on the number of hours an engine type has flown. This historical approach has largely neglected the frequency of use and the non-flying operating time accumulated on an engine. For example, aircraft sorties, taxi time, and maintenance test runs are recorded, but are not considered part of the USAF actuarial forecasting system. Flying hours should be considered only a part of the engine failure problem, not all of it.

Sorties. A sortie is defined as

A flight, by the same aircraft, intended to accomplish a specific assigned task. A sortie is normally terminated by engine shutdown. A flight beginning and ending at the same airdrome, is considered one sortie, even though touch-and-go landings may be accomplished at other airdromes. (4:209)

Engine managers, engineers, and using Air Force commands have often expressed the opinion that sorties may be a major factor in the life of an engine. (22) A preliminary study was performed by operations research personnel at AFLC Headquarters in 1970 under the assumption that sorties could be related to engine failures. The early results of this study showed promise for using a combination of sorties and flying hours as an engine failure prediction tool. (23) A definitive RAND Corporation study (RM-6010-PR, June 1969) reported the results of simulation exercises in this general area. It also indicated that a combination of sorties and flying hours should result in a superior prediction tool.


Selection of Aircraft-Engine Combination

Several aircraft-engine combinations were considered for use in this study. Each was evaluated against the following criteria:

1. Were there enough observations available so that a significant statistical analysis could be applied?

2. Did the aircraft represent a major weapon system of the Air Force inventory?

3. Was the aircraft considered as having a future in the Air Force inventory?

4. Did the aircraft represent a major investment in Department of Defense resources?

5. Were there at least three years of unclassified program activity data available?

The final selection consisted of the B-52H strategic bomber and its TF33-3 engines. The Air Force possesses 99 of these aircraft. There are 792 installed engines and 96 spares in the supply system. (22) The selection of this aircraft-engine combination was thoroughly discussed with engine management personnel at AFLC Headquarters, who expressed the opinion that it was highly suited to the type of study being performed.

Data Collection

The following data was collected on the entire B-52H fleet:

1. Total number of flying hours accumulated per month.

2. Total number of sorties accumulated per month.

3. Total number of engine failures requiring depot or intermediate level maintenance accumulated per month.

All this data was available within the Directorate of Propulsion and Auxiliary Power Systems Office, Headquarters AFLC.

To simplify the explanation of the data collection technique, Table 1 was constructed.


Table 1

DATA COLLECTION TECHNIQUE

DATA                       SOURCE OFFICE   SOURCE DOCUMENT                           EXTRACTION METHOD
Flying Hours Accumulated   AFLC/MMAPP      Monthly Aerospace Vehicle Status Report   Manual
                                           (G033BNF)
Sorties Accumulated        AFLC/MMAPP      Monthly Aerospace Vehicle Status Report   Manual
                                           (G033BNF)
Actual Recorded Failures   AFLC/MMPP       Aircraft Engine Removal and Loss Report   Manual
                                           (D024FI03-N1)

Cost and time constraints precluded a mechanized data gathering technique.

The time frame over which the data was collected was 1 October 1965 through 30 September 1971, a period of 72 months. Flying hour and sortie data were available for the entire 72-month period. However, engine failure data was available only from 1 July 1968 through 30 September 1971, a period of 39 months. Figure 1 graphically presents the time periods covered by the data.
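The observation numbers used in Table 2 and in the variable subscripts appear to count months from the start of the collection period (observation 44 is May 1969, for example). A minimal sketch of that mapping, assuming observation 1 is October 1965, reproduces the 72-month and 39-month spans quoted above. The function name `observation_number` is hypothetical.

```python
def observation_number(year: int, month: int) -> int:
    """Return the Table 2 observation number for a calendar month.

    Assumes observation 1 is October 1965 and observation 72 is
    September 1971, the full collection period described in the text.
    """
    n = (year - 1965) * 12 + (month - 9)
    if not 1 <= n <= 72:
        raise ValueError("month lies outside October 1965 - September 1971")
    return n

first_failure_month = observation_number(1968, 7)   # July 1968
last_month = observation_number(1971, 9)            # September 1971
# prints: 34 72 39
print(first_failure_month, last_month, last_month - first_failure_month + 1)
```

The same mapping explains why the failure variable is indexed i = 34, 35, ..., 72 while sorties and flying hours run from i = 1.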


Figure 1
Time Frame of Available Data

Data Screening

After the basic data of engine failures, flying hours, and sorties for the B-52H aircraft fleet were collected, a process of screening and organizing was employed to construct the Master Data Sheet, Table 2.

The data extracted as engine failures was examined for (1) invalid reason-for-removal codes, (2) engine removal codes that related to other than depot or field level failures, and (3) general keypunch errors. Each engine removal was a separate entry in the D024FI03-N1 report; therefore, it was necessary to group the failures by calendar month for inclusion on the Master Data Sheet. The examination of the removal codes and the grouping of the data were mechanized on the G.E. 115 Batch Remote Computer.
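The screening and monthly grouping of removal records can be sketched as follows. This is an illustration only: the record layout, the `monthly_failures` helper, and the set of qualifying reason-for-removal codes are hypothetical, because the text does not list the actual D024FI03-N1 codes.

```python
from collections import Counter
from datetime import date
from typing import Iterable, Tuple

# Hypothetical: stands in for the reason-for-removal codes that denote
# depot or field level failures; the real code list is not given in the text.
FAILURE_CODES = {"A1", "B2"}

def monthly_failures(removals: Iterable[Tuple[date, str]]) -> Counter:
    """Count qualifying engine removals per (year, month).

    Each removal is one record, as in the removal report; records whose
    code does not denote a failure are discarded during screening.
    """
    counts: Counter = Counter()
    for removed_on, code in removals:
        code = code.strip().upper()   # hypothetical cleanup of keypunch spacing/case
        if code in FAILURE_CODES:
            counts[(removed_on.year, removed_on.month)] += 1
    return counts

records = [
    (date(1968, 7, 3), "A1"),
    (date(1968, 7, 20), "b2 "),
    (date(1968, 7, 21), "ZZ"),      # not a failure code: screened out
    (date(1968, 8, 1), "A1"),
]
print(dict(monthly_failures(records)))
```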

The flying hour and sortie data was examined for (1) missing monthly entries and (2) general keypunch errors. The data was extracted from the G033BNF report, manually examined, and entered directly onto the Master Data Sheet.

Preliminary Data Study

As the first step in the preliminary study, a set of histograms and time series graphs was prepared for each of the three basic variables measured. Figures 2a, 2b, 3a, 3b, 4a, and 4b depict this part of the study. Next, a set of curves was fit to specific groupings of the data using the subroutine TAJRFIT, available on the G.E. 615 series computer time-sharing system and adapted to cathode ray tube (CRT) display. This program fits six different least squares curves to the supplied data. Six coefficients of determination (R²) are then presented so that the operator may select the curve he wishes to plot. The data groupings were (1) failures versus sorties, (2) failures versus flying hours, and (3) sorties versus flying hours. In the first and second groupings a linear function was determined to be the best curve fit, but in both cases it was far from what might be called a "good fit" (see Figures 5 and 6). These observations gave rise to a suspicion that a relationship other than linear might emerge when different variables were combined in the multiple regression analysis. However, no such relationship developed or was found to exist.

Table 2 (continued)

Observation  Date     Observed  Observed      Observed
Number                Failures  Flying Hours  Sorties
44           May 69   26        3120          383
45           Jun      10        2559          333
46           Jul      25        2727          322
47           Aug      35        3230          362
48           Sep      25        2642          331
49           Oct      20        1401          198
50           Nov      23        2604          320
51           Dec      28        2604          330
52           Jan 70   12        2598          313
53           Feb      23        2877          318
54           Mar      23        2381          302
55           Apr      27        2266          306
56           May      36        2854          327
57           Jun      29        2137          433
58           Jul      33        2817          377
59           Aug      33        2699          332
60           Sep      21        2704          330
61           Oct      21        2993          367
62           Nov      11        2588          333
63           Dec      20        2486          327
64           Jan 71   18        2096          305
65           Feb      21        3034          335
66           Mar      33        3167          392
67           Apr      29        3144          421
68           May      18        2870          383
69           Jun      34        2921          420
70           Jul      24        3053          427
71           Aug      25        3218          430
72           Sep      15        [illegible]   403

NOTE: Failure Data Not Available Before July 1968

[Figure: B-52H Monthly Flying Hour History; vertical axis: flying hours in thousands]
[Figure: Histogram of Monthly Flying Hour Data]
[Figure: monthly engine failures; NOTE: Failure Data Not Available Before July 1968]
[Figure: engine failure data; NOTE: Data available July 1968 - September 1971 only]

The fit of a hyperbolic function to the third grouping, sorties versus flying hours (Figure 7), helped substantiate the authors' belief that while sorties and flying hours are related, the relationship tends to be curvilinear rather than linear and is not one of extremely high correlation. Thus, sorties are probably neither totally dependent on nor independent of flying hours, and may be an additional factor worthy of consideration when attempting to explain engine failures.
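The linear and hyperbolic fits discussed above are ordinary least squares problems. The sketch below fits both and reports R², assuming the form y = a + b/x for the "hyperbolic" curve (the text does not say which hyperbolic form TAJRFIT used). It uses only the Table 2 rows reproduced in this section, observations 44 through 71 (observation 72's flying hours are illegible), so its numbers will not match Figures 5 through 7. The helper names are hypothetical.

```python
from typing import Sequence, Tuple

# Table 2 rows reproduced in this section: observations 44 (May 1969) to 71 (Aug 1971).
FAILURES = [26, 10, 25, 35, 25, 20, 23, 28, 12, 23, 23, 27, 36, 29,
            33, 33, 21, 21, 11, 20, 18, 21, 33, 29, 18, 34, 24, 25]
HOURS = [3120, 2559, 2727, 3230, 2642, 1401, 2604, 2604, 2598, 2877, 2381, 2266, 2854, 2137,
         2817, 2699, 2704, 2993, 2588, 2486, 2096, 3034, 3167, 3144, 2870, 2921, 3053, 3218]
SORTIES = [383, 333, 322, 362, 331, 198, 320, 330, 313, 318, 302, 306, 327, 433,
           377, 332, 330, 367, 333, 327, 305, 335, 392, 421, 383, 420, 427, 430]

def least_squares_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = a + b*x by ordinary least squares; return (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - sse / sst

def hyperbolic_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = a + b/x (assumed hyperbolic form) as a straight line in 1/x."""
    return least_squares_line([1.0 / xi for xi in x], y)

fits = [
    ("failures vs sorties (linear)", least_squares_line(SORTIES, FAILURES)),
    ("failures vs flying hours (linear)", least_squares_line(HOURS, FAILURES)),
    ("sorties vs flying hours (y = a + b/x)", hyperbolic_fit(HOURS, SORTIES)),
]
for label, (a, b, r2) in fits:
    print(f"{label}: a = {a:.4g}, b = {b:.4g}, R^2 = {r2:.3f}")
```

Because y is not transformed in the hyperbolic case, its R² is directly comparable with the linear fits, which is the comparison a curve-selection program of this kind presents to the operator.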

Data Arrangement

The data arrangement process was broken down into three steps: (1) determination of the prediction period, (2) historical grouping of the data, and (3) adaptation to the Biomedical Program's standard matrix form.
Prediction period construction. Prior to the historical grouping of the data, it was necessary to determine the prediction period so that the data could be structured as portrayed in Figure 8. Using this prediction period, two techniques were designed to validate the forecasting ability of each of the models developed by multiple regression.

1. Technique 1 applied the model derived from the base period 1 July 1968 through 30 September 1970 to each of the eleven months in the prediction period, 1 November 1970 through 30 September 1971. The same linear model was employed for all eleven predictions (see Figure 8).

2. Technique 2 applied the linear model developed from the base period 1 July 1968 through 30 September 1970 to the prediction of November 1970 failures. The equation was then updated by adding October 1970's observed failures, flying hours, and sorties to the data base, dropping the oldest set of observations (July 1968), and recomputing the regression. In this manner the most current data was used and the data base was held at 27 months. Eleven different equations were developed, one for each month in the prediction period.

In general, for both techniques, all forecasts were computed on the first day of each month for all engine failures that would occur during the following month. For example, on 1 October 1970, a prediction of engine failures for the entire month of November was made. This approach required estimates of the flying hour and sortie totals for October and November.
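Under the Table 2 numbering (34 = July 1968, 60 = September 1970, 62 = November 1970, 72 = September 1971), the two validation techniques differ only in which 27 months feed each regression. A minimal sketch, with hypothetical function names, of the base period each technique uses for a given target month:

```python
BASE_START, BASE_END = 34, 60   # July 1968 .. September 1970 (27 months)
PRED_START, PRED_END = 62, 72   # November 1970 .. September 1971 (11 months)
BASE_LENGTH = BASE_END - BASE_START + 1

def technique1_base(target: int) -> range:
    """Technique 1: the same fixed base period for every target month."""
    return range(BASE_START, BASE_END + 1)

def technique2_base(target: int) -> range:
    """Technique 2: a rolling 27-month base ending two months before the target.

    The forecast for month `target` is made on the first day of month
    target - 1, so the newest fully observed month is target - 2.
    """
    end = target - 2
    return range(end - BASE_LENGTH + 1, end + 1)

for t in range(PRED_START, PRED_END + 1):
    b1, b2 = technique1_base(t), technique2_base(t)
    print(t, (b1.start, b1.stop - 1), (b2.start, b2.stop - 1))
```

For November 1970 (62) both techniques use months 34 through 60; for December 1970 (63) Technique 2 has added October 1970 (61) and dropped July 1968 (34). In either case the sorties and flying hours for the forecast month and the month before it are estimates, not observations.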

Historical grouping. It was quite conceivable that engine failures were the result of some historical combination of flying hours and sorties. The question, though, was how failures, flying hours, and sorties were related and, if a useful relationship existed, whether it was time-oriented in the historical context. The stepwise multiple regression package selected was designed to relate independent and dependent variables and posed no particular problem. The historical orientation, however, was more difficult to handle. The approach taken was to "create" variables from historically grouped flying hour and sortie data. Perhaps the easiest way to describe this grouping process is to define the variables used. Specifically, let

$X_{i1}$ = the observed engine failures for the ith month (i = 34, 35, 36, ..., 72)

$X_{i2}$ = the observed sorties flown for the ith month (i = 1, 2, 3, ..., 72)

$X_{i3}$ = the observed flying hours flown for the ith month (i = 1, 2, 3, ..., 72)

These three variables are the original data collected and tabulated in Table 2.
In general, let

$X_{ij}$ = the jth independent variable (accumulated sorties) associated with the ith month (j = 2, 4, 6, ..., 66; i = 34, 35, 36, ..., 72)

$X_{ir}$ = the rth independent variable (accumulated flying hours) associated with the ith month (r = 3, 5, 7, ..., 67; i = 34, 35, 36, ..., 72)

These are the created variables, and they were determined by the following equations:


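$$X_{ij} = \sum_{k=i-m+1}^{i} X_{k2}, \qquad j = 2m \qquad \text{(sortie accumulation)}$$

$$X_{ir} = \sum_{k=i-m+1}^{i} X_{k3}, \qquad r = 2m + 1 \qquad \text{(flying hour accumulation)}$$

where $m$ = the number of months of historical data to be summed ($1 \le m \le 33$).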


In this manner, 64 independent variables were "created" to represent the historical accumulation of flying hour and sortie data. Thirty-two of these variables, ranging from two months to 33 months of accumulated data, pertain to sortie history. The remaining 32 variables are similar accumulations of flying hour history. When the created variables are combined with the two observed variables, there are 66 independent variables available for regression against the dependent variable, engine failures. Table 3 is the key to variable identification.


Table 3

VARIABLE IDENTIFICATION KEY

If variable is:                        Then it is defined as:

$X_{i1}$                               Observed engine failures for the ith month
$X_{ij}$, where j = 2, 4, 6, ..., 66   j/2 months historical accumulation of total fleet sorties
$X_{ir}$, where r = 3, 5, 7, ..., 67   (r-1)/2 months historical accumulation of total fleet flying hours

NOTE: i equates to an observation as presented in Table 2. In this table, i = 34, 35, 36, ..., 72.
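The numbering in Table 3 is regular enough to decode mechanically. A minimal sketch, using a hypothetical helper name, that turns a variable number into its meaning:

```python
def describe_variable(v: int) -> str:
    """Translate a variable number (1..67) into its Table 3 definition."""
    if v == 1:
        return "observed engine failures"
    if 2 <= v <= 66 and v % 2 == 0:
        return f"{v // 2}-month accumulation of fleet sorties"
    if 3 <= v <= 67 and v % 2 == 1:
        return f"{(v - 1) // 2}-month accumulation of fleet flying hours"
    raise ValueError(f"variable {v} is outside the 1..67 range of Table 3")

for v in (1, 2, 3, 32, 67):
    print(v, describe_variable(v))
```

For example, Variable 32, the one coefficient in the 20-step equation found not significant, is the 16-month accumulation of fleet sorties.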


Biomedical Standard Matrix. In general, the Biomedical programs require that all data be prepared in a two-dimensional matrix arranged by case and by variable. (10:11) In this instance a case equates to one month's observations. This specific matrix was built with punch cards and stored in a permanent file on the G.E. 615 computer.

The next major step was the development of the prediction models using the Biomedical Series Program BMD02R, Stepwise Multiple Regression package.


Chapter  3 


MODEL  DEVELOPMENT 

This chapter delineates in detail how the model was developed, the statistical assumptions verified, and the predictive power of each model validated. It is divided into four topics: (1) Computer Program Description, (2) Building the Model, (3) Verification of the Statistical Tool, and (4) Validation of the Model. The analysis of the results obtained and the conclusions drawn from that analysis are discussed in the succeeding chapter.

Program  Description 

The BMD02R stepwise multiple regression program computes a series of linear regression equations in a prescribed sequential manner. (10:233) The first step selects the independent variable that explains the greatest part of the variation in the dependent variable and calculates the simple regression relationship between them. The second independent variable is selected on the basis of making the greatest additional contribution to the explained variation. Continuing in this manner, the program carries out the regression for each independent variable that can significantly contribute to the reduction of unexplained variation in the dependent variable. At each step, the coefficient of multiple correlation (R), the standard error of the estimate, and an analysis of variance on the regression and residual values are presented. Also included at each step is a listing of the constant term and all variables entered into the equation, along with their respective net regression coefficients and standard errors. The final portion of the output at each step is a listing of all the independent variables not included in the equation and their respective partial correlation coefficients. These coefficients give an indication of the relative importance of each of the variables not yet entered into the regression equation. After the last step, a listing of the residuals is prepared. These residuals are the variation in the dependent variable not explained by the multiple regression equation of the last step.

Optional output features available are: (1) a mean and standard deviation table for all variables, (2) a covariance matrix, (3) a correlation matrix, (4) a summary table, and (5) graphic plots of the residuals against selected independent variables that appear in the final regression equation. (10:233)
Building the Model

In Chapter 2 the initial model base period was established as July 1968 through September 1970, a period of 27 months.* The first stepwise regression was run on this data with no limit set on the number of steps the program would execute. The program took a total of 25 steps before reaching the internally specified F test alpha (significance) levels of .01 for variable inclusion and .005 for variable deletion. Careful examination of the output indicated that the standard error of the estimate, or sample standard deviation of the regression, reached a minimum at the 20th step. This occurrence is portrayed in Figure 9. With the standard error of estimate at a minimum value of 0.5650 engine failures and the coefficient of multiple determination (R²) equal to 0.9990, the regression equation at the 20th step was selected for use in the Technique 1 predictions. Because of its length, the specific equation is listed in Figure 10. A regression equation that contains 20 variable coefficients fitted to only 27 observations is somewhat dubious; therefore, the population net regression coefficients were tested for significance in two separate ways. (24:788-793)
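The step-selection rule used here, taking the step at which the standard error of estimate s = sqrt(SSE/(n - p)) is smallest, can be sketched as follows. The helper names are hypothetical, and `best_step` assumes each step adds one variable and removes none. As a check against the text, n = 27 and p = 21 leave 6 residual degrees of freedom, so s = 0.5650 corresponds to a residual sum of squares of about 6 × 0.5650² ≈ 1.915.

```python
import math
from typing import Sequence

def standard_error(sse: float, n: int, p: int) -> float:
    """Standard error of estimate, sqrt(SSE / (n - p)); p counts the constant."""
    if n <= p:
        raise ValueError("need more observations than fitted parameters")
    return math.sqrt(sse / (n - p))

def best_step(sse_by_step: Sequence[float], n: int) -> int:
    """Return the 1-based step with the smallest standard error of estimate.

    Step k is assumed to hold k variables plus a constant (p = k + 1).
    """
    errors = [standard_error(sse, n, k + 1) for k, sse in enumerate(sse_by_step, start=1)]
    return min(range(len(errors)), key=errors.__getitem__) + 1

print(round(standard_error(1.9153, n=27, p=21), 4))
```

Unlike R², which cannot decrease as variables are added, s rises once a new variable's reduction in SSE no longer offsets the residual degree of freedom it consumes, which is why it can bottom out before the program's last step.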


*Technique 1 and Technique 2 were virtually identical in all aspects of development and verification; therefore, for simplicity, only Technique 1 is discussed in this chapter. The one exception is that the prediction ability of Technique 2 is included in the discussion of the validation process.
At each step, the BMD package program tests the hypothesis that the population net regression coefficients ($B_i$) are zero, which amounts to a test of the overall significance of the regression. The hypotheses being tested were:

$$H_0: B_1 = B_2 = \cdots = B_k = 0$$
$$H_1: \text{at least one } B_i \neq 0$$

An F statistic was calculated by performing an analysis of variance of the regression and residual values. At the 20th step, the degrees of freedom for the regression were p - 1 = 20, where p is the number of independent variables in the regression equation plus one for the constant term. The degrees of freedom for the residuals at the same step were n - p = 6, where n is the total number of observations (or cases) considered. The critical F value with 20 and 6 degrees of freedom at the .05 alpha level equaled 3.86. At step 20 the F statistic was calculated to be 313.882.

Because the F statistic far exceeded the critical value, it was significant and the null hypothesis was rejected. Technically, it could further be said that there was regression in the population and that the improvement brought by fitting this regression plane was not due to chance. It should be noted, though, that every step (20 in all) was successful in rejecting the stated null hypothesis.
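The overall F statistic can also be written in terms of R²: F = [R²/(p - 1)] / [(1 - R²)/(n - p)]. A short check with the step-20 values from the text (R² = 0.9990, n = 27, p = 21) gives about 300, consistent with the reported 313.882 once R² is carried beyond four decimal places. The function name is hypothetical.

```python
def overall_f(r_squared: float, n: int, p: int) -> float:
    """Overall regression F with p - 1 and n - p degrees of freedom, from R^2."""
    return (r_squared / (p - 1)) / ((1.0 - r_squared) / (n - p))

F_CRIT = 3.86   # tabled F(.05; 20, 6) as quoted in the text

f = overall_f(0.9990, n=27, p=21)
print(round(f, 1), f > F_CRIT)
```

The formula also shows why the test is so easily passed here: with 20 variables and 27 cases, an R² near 1 is almost guaranteed, so rejecting H0 at every step says little about forecasting ability.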

At this point, a different tack was taken and the population net regression coefficients were tested separately for significance with a t test. The hypotheses under test were:

V*  Bi  =  0 

Hj  i  +  0 

A t statistic was calculated for each coefficient by dividing that net regression coefficient by its standard error. The degrees of freedom present for each t statistic were n - p = 6. At the 20th step, with 6 degrees of freedom, an alpha level of .05, and a two-tailed test, $t_{crt} = \pm 2.447$. Table 4 tabulates the individual t tests and their comparison to $t_{crt}$.

From Table 4, it can be seen that for only one variable, Variable 32 ($t = 2.232 < 2.447$), could the null hypothesis not be rejected. Therefore, it could be said that there were 19 significant variables in the regression equation.
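The comparison in Table 4 is mechanical enough to script. The sketch below is a minimal Python illustration, not the authors' procedure: it divides each net regression coefficient by its standard error and compares |t| with the critical value of 2.447 stated above. Because the table entries are rounded, recomputed ratios can differ slightly from the printed t values (0.07109/0.00567 gives about 12.54 for Variable 4, against 12.337 in the table), but no accept-or-reject decision changes.

```python
# (variable, net regression coefficient a, standard error b) from Table 4
TABLE_4 = [
    (2, -0.03597, 0.00613), (3, 0.00386, 0.00052), (4, 0.07109, 0.00567),
    (7, -0.01457, 0.00053), (9, 0.01387, 0.00053), (10, -0.01286, 0.00515),
    (12, -0.02684, 0.00492), (15, 0.00493, 0.00033), (20, 0.04749, 0.00372),
    (23, -0.00600, 0.00042), (25, 0.00150, 0.00030), (28, 0.01047, 0.00398),
    (32, 0.01509, 0.00676), (34, -0.10957, 0.00488), (37, -0.00451, 0.00051),
    (38, 0.09315, 0.00545), (45, 0.00086, 0.00035), (54, -0.04043, 0.00442),
    (56, 0.08306, 0.00560), (62, -0.02947, 0.00450),
]
T_CRIT = 2.447  # two-tailed, alpha = .05, 6 degrees of freedom (from the text)

def t_tests(rows, t_crit=T_CRIT):
    """Split variables into (significant, not_significant) using |a/b| > t_crit."""
    significant, not_significant = [], []
    for var, coef, se in rows:
        t = coef / se  # t statistic = coefficient / its standard error
        (significant if abs(t) > t_crit else not_significant).append(var)
    return significant, not_significant

sig, nonsig = t_tests(TABLE_4)
print(len(sig), "significant; H0 not rejected for:", nonsig)
```

It finds 19 significant variables, with only Variable 32 failing to reject $H_0$; Variable 45 ($|t| \approx 2.457$) clears the critical value by the narrowest margin.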

The same F and t tests were applied to the equations developed for Technique 2 with similar results. The F tests rejected the stated null hypothesis at every step, and the t tests were equally unsuccessful in limiting the number of significant variables. The large number of variables included in the regression equations and the results of the F and t tests raised serious doubts about the validity of the data arrangement being employed.

Table 4

"t" TEST FOR SIGNIFICANCE ON SAMPLE REGRESSION EQUATION

             NET REGRESSION     STANDARD ERROR OF NET         "t" VALUE
VARIABLE     COEFFICIENT (a)    REGRESSION COEFFICIENT (b)    (a/b)

    2          -0.03597             0.00613                    -5.867
    3           0.00386             0.00052                     7.423
    4           0.07109             0.00567                    12.337
    7          -0.01457             0.00053                   -27.490
    9           0.01387             0.00053                    26.169
   10          -0.01286             0.00515                    -2.497
   12          -0.02684             0.00492                    -5.455
   15           0.00493             0.00033                    14.939
   20           0.04749             0.00372                    12.766
   23          -0.00600             0.00042                   -14.285
   25           0.00150             0.00030                     5.000
   28           0.01047             0.00398                     2.630
   32           0.01509             0.00676                     2.232*
   34          -0.10957             0.00488                   -22.452
   37          -0.00451             0.00051                    -8.843
   38           0.09315             0.00545                    17.091
   45           0.00086             0.00035                     2.457
   54          -0.04043             0.00442                    -9.147
   56           0.08306             0.00560                    14.832
   62          -0.02947             0.00450                    -6.548

* "t" value less than $t_{crt}$; null hypothesis not rejected

n = 27 observations
p = 21 variables (the 20 included in the equation at the 20th step plus one for the constant term)
Degrees of freedom = n - p = 6
$H_0: B_i = 0$; $H_1: B_i \neq 0$; $\alpha/2 = .025$ (two-tailed test)
$t_{crt} = \pm 2.447$, with DF = 6 and $\alpha/2 = .025$

Verification of the
Statistical Tool

The use of models developed by multiple regression in making statistical inferences implies that several assumptions have been made. These assumptions are all related to the residuals, or estimates of the error term $\varepsilon_i$, contained in each developed model. They are:

1.  The  residuals  are  clustered  around  a  rectilinear 
plane,  commonly  known  as  the  assumption  of  linearity. 

2.  The  residuals  are  uniform  in  their  scatter  or 
homoscedasticity  is  present. 

3.  The  residuals  are  statistically  independent  of 
each  other  or  there  is  no  serial  or  autocorrelation. 

4.  The  residuals  are  normally  distributed. 

These assumptions were recognized, and each was tested graphically or statistically to establish its validity within the developed models. If all four assumptions are satisfied, it is then possible to measure the sampling error of the net regression coefficients and the error associated with any given point on the regression plane. These measures could then be used to make valid statistical inferences about the true regression relationships.

Before applying the developed models, an additional check was made for collinearity, or simple correlation between the independent variables. When the independent variables in a multiple regression are highly correlated with each other, the net regression coefficients may be unreliable. (2:610) As stated above, the four assumptions are related to, and tested by, the residuals. The residuals are an estimate of the error term and are commonly expressed as

$e = Y - Y_c$

where Y is a specific observed value of the dependent variable, $Y_c$ the estimate of Y calculated by the least squares regression equation $Y_c = a + bX$, and e the residual, or deviation of Y from $Y_c$.
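To make the residual definition concrete, here is a minimal Python sketch for the single-variable case $Y_c = a + bX$ given above. The monthly figures are hypothetical and the function names are illustrative; the multiple-regression residuals used in this study came from the BMD02R output, not from code like this.

```python
def least_squares_line(xs, ys):
    """Fit Y_c = a + bX by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residuals(xs, ys):
    """Residuals e = Y - Y_c about the fitted least squares line."""
    a, b = least_squares_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical monthly flying hours (X) and engine failures (Y), illustration only
hours = [2800.0, 3100.0, 3350.0, 2950.0, 3600.0, 3900.0]
failures = [24.0, 27.0, 26.0, 25.0, 31.0, 30.0]

e = residuals(hours, failures)
print([round(v, 3) for v in e])
print("sum of residuals:", round(sum(e), 10))  # least squares residuals sum to zero
```

Because the equation contains a constant term, the residuals always sum to zero, which gives a quick sanity check on any residual listing.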

Test of linearity and homoscedasticity. A visual assessment of the plots of residuals against each of the independent variables included in the regression equation is considered an adequate and useful check on the validity of the assumptions of linearity and homoscedasticity. (2:608) An examination of Figure 11 indicates that the scatter of the residuals plotted against Variable 3 is approximately uniform and that there is no evidence of curvilinearity. Similar plots were obtained for all of the variables in the Technique 1 equation (see Appendix A). Thus, it was concluded that the assumptions of linearity and homoscedasticity were valid for the Technique 1 model.

Figure 11. Plot of Residuals (Y-axis) Against Variable 3 (X-axis)

Test of statistical independence. When dealing with time series data, there is a distinct possibility that the residuals may not be independent. If they are not, and serial correlation can be shown to exist, then least squares regression analysis may not give the best estimates, because the estimates will not have minimum variance. Yamane recommends the Durbin-Watson test to determine whether or not the residuals are statistically independent. A d statistic is computed from the residuals (deviates) and their first differences and then compared against critical values prepared by Durbin and Watson. (24:809-813) Both positive and negative serial correlation are tested by this method. However, it was discovered that the critical value table for d allows for only 5 independent variables. A review of Durbin and Watson's original work in this area indicated that this particular test lost its significance when large numbers of independent variables were present. (11:409-428; 12:159-178) Deprived of this proven test procedure and unable to find a suitable replacement, the authors had to leave the assumption of statistical independence among the residuals untested.
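The d statistic itself is simple to compute, $d = \dfrac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$; what was missing was a table of critical values for 20 independent variables. The minimal Python sketch below (the function name is illustrative) computes d only and makes no attempt to supply those critical values.

```python
def durbin_watson(resid):
    """Durbin-Watson d: sum of squared first differences of the residuals
    divided by the sum of squared residuals.

    d near 2 suggests no first-order serial correlation; values toward 0
    suggest positive, and values toward 4 negative, serial correlation.
    """
    if len(resid) < 2:
        raise ValueError("need at least two residuals")
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(r ** 2 for r in resid)
    return num / den

print(durbin_watson([1.0, -1.0, 1.0, -1.0]))       # 3.0, alternating residuals
print(durbin_watson([0.5, 0.6, 0.4, 0.5, 0.7]))    # small d, slowly varying residuals
```

Residuals listed in time order are required; shuffling them changes d, which is exactly why the test detects serial correlation.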

Test of normality. The Fisher g test was employed to test the normality of the residuals obtained at the 20th step of each regression equation. (12:52) The following example illustrates the test performed on the regression equation for Technique 1. The hypothesis was:

$H_0: e \sim N(\mu = 0, \sigma^2)$
$H_1: e \nsim N(\mu = 0, \sigma^2)$

Two statistics, $g_1$ and $g_2$, their variances $V(g_1)$ and $V(g_2)$, and their standardized variates $Z(g_1)$ and $Z(g_2)$ were calculated with the program found in Appendix B. Those values were:

$g_1 = -0.4437 \qquad g_2 = -0.7729$
$V(g_1) = 0.2006 \qquad V(g_2) = 0.7605$
$Z(g_1) = -0.9908 \qquad Z(g_2) = -0.8862$

The algebraic statement in each case was:

$P[g_1 < -0.4437 \mid H_0] = P[z < Z(g_1)]$
$P[g_2 < -0.7729 \mid H_0] = P[z < Z(g_2)]$

At an alpha level of .05 and conducting a two-tailed test, $z_{crt} = \pm 1.96$. In neither case did the standardized variate exceed $z_{crt}$ in absolute value; therefore, the null hypothesis could not be rejected. The negative signs on $g_1$ and $g_2$ do indicate that the distribution is somewhat skewed to the left and platykurtic (flatter than normal). Because the null hypothesis was not rejected, however, the assumption of normality was considered satisfied.
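The Appendix B program is not reproduced in this section. The Python sketch below is an assumed equivalent built on the standard definitions of Fisher's $g_1$ and $g_2$ from the k-statistics and their sampling variances under normality; the function names are illustrative. With n = 27 it reproduces the variances reported above to four decimals, and the standardized variates agree with the reported values to within the rounding of $g_1$ and $g_2$.

```python
import math

def fisher_g(resid):
    """Fisher's g1 (skewness) and g2 (kurtosis) computed from the k-statistics."""
    n = len(resid)
    if n < 4:
        raise ValueError("need at least four observations")
    mean = sum(resid) / n
    m2 = sum((r - mean) ** 2 for r in resid) / n
    m3 = sum((r - mean) ** 3 for r in resid) / n
    m4 = sum((r - mean) ** 4 for r in resid) / n
    k2 = n * m2 / (n - 1)
    k3 = n ** 2 * m3 / ((n - 1) * (n - 2))
    k4 = n ** 2 * ((n + 1) * m4 - 3 * (n - 1) * m2 ** 2) / ((n - 1) * (n - 2) * (n - 3))
    return k3 / k2 ** 1.5, k4 / k2 ** 2

def g_variances(n):
    """Sampling variances V(g1) and V(g2) under the normality hypothesis."""
    v1 = 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))
    v2 = 24 * n * (n - 1) ** 2 / ((n - 3) * (n - 2) * (n + 3) * (n + 5))
    return v1, v2

# Technique 1 figures from the text (n = 27 residuals at the 20th step)
g1, g2 = -0.4437, -0.7729
v1, v2 = g_variances(27)
z1, z2 = g1 / math.sqrt(v1), g2 / math.sqrt(v2)
print(f"V(g1)={v1:.4f} V(g2)={v2:.4f} Z(g1)={z1:.4f} Z(g2)={z2:.4f}")
print("reject H0" if max(abs(z1), abs(z2)) > 1.96 else "fail to reject H0")
```

Given a list of residuals, `fisher_g` supplies $g_1$ and $g_2$ directly; a negative $g_1$ indicates left skew and a negative $g_2$ a flatter-than-normal (platykurtic) shape.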

Check  for  high  multicollinearity.  Multicollinearity 
refers  to  the  presence  of  correlation  between  the  independent 
variables  of  a  regression  model.  As  was  noted  earlier,  it  is 
considered  good  practice  to  check  for  its  presence  before 
accepting  a  regression  model  as  reliable  for  use.  In  general, 
the  existence  of  multicollinearity  results  in  the  inaccurate 
estimation  of  the  regression  coefficients  because  of  the 
large sample variances of the coefficient estimators. (15:149) Thus, the net regression coefficients become unreliable. It should be noted, however, that while collinearity affects the reliability of individual coefficients in the regression, it may not alter the predictive power of the total regression equation. (2:610)

The method employed for checking the seriousness of multicollinearity called for a comparison of the simple correlation ($r_{ij}$) between pairs of independent variables and the coefficient of multiple correlation (R). The simple correlation for every pair of independent variables considered was available in the correlation matrix of the BMD output. The "rule of thumb" suggested by Klein is that if $r_{ij} > R$, then the multicollinearity which exists is critical and adversely affects the model. (15:154) The validity of the check in this instance was questioned, however, when an $r_{ij}$ of .99 was compared to an R of .9995 and the high collinearity was therefore deemed not to be critical. A rational approach would assume that when near-perfect correlation exists between several of the independent variables, multicollinearity does exist to a "critical" degree. Therefore, this check was considered inconclusive.
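Klein's rule of thumb reduces to scanning the correlation matrix for pairs with $|r_{ij}| > R$. The minimal Python sketch below does that with hypothetical variables; the function names are illustrative, and the last line reproduces the case described above in which $r_{ij} = .99$ passes unflagged against R = .9995.

```python
import math
from itertools import combinations

def pearson_r(xs, ys):
    """Simple (Pearson) correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def klein_flags(columns, multiple_r):
    """List pairs of independent variables whose |r_ij| exceeds R.

    Klein's rule of thumb treats such pairs as critical multicollinearity.
    Comparing |r_ij| with R is equivalent to comparing r_ij**2 with R**2.
    """
    flagged = []
    for (name_i, xi), (name_j, xj) in combinations(columns.items(), 2):
        r_ij = pearson_r(xi, xj)
        if abs(r_ij) > multiple_r:
            flagged.append((name_i, name_j, round(r_ij, 4)))
    return flagged

# Hypothetical independent variables, for illustration only
cols = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [2, 4, 6, 8, 10.5],   # nearly a multiple of x1
    "x3": [5, 3, 4, 1, 2],
}
flags = klein_flags(cols, multiple_r=0.95)
print(flags)

# The case in the text: r_ij = .99 against R = .9995 is not flagged,
# which is why the check was judged inconclusive.
print(0.99 > 0.9995)  # False
```

When R is itself very close to 1, as it was here, almost no pair can exceed it, so the rule loses its power precisely where collinearity is most severe.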

Verification summary. In brief summary, the four assumptions associated with multiple regression were tested and the following conclusions reached:

1. The assumptions of linearity and homoscedasticity were determined to be valid.

2. The assumption of statistical independence among the residuals had to be left untested.

3. The assumption of normality among the residuals was considered valid.

In addition, the check for critical multicollinearity was considered inconclusive.

Validation  of  the  Model 

At this point, a decision had to be made whether to change the basic approach and develop a new model or to test the forecasting ability of the existing model. Although there were strong indications that the model was deficient, it was decided to proceed with the forecasting and evaluate the results.

As described in Chapter 2, two techniques were utilized in obtaining forecasts. Technique 1 used a single model to predict for eleven months. Technique 2 varied the model, by moving the data base, to use the most current data in making each of eleven monthly predictions. In this manner, two sets of engine failure forecasts were compiled. These forecasts are tabulated in Table 5. It was becoming obvious that the selected approach was not working and that only a "feel for accuracy" would be required; the remaining emphasis was to be placed on the reasons for the failure to forecast accurately. This "feel for accuracy" was obtained by comparing the absolute differences between the model's forecasts and actual failures with the absolute differences between the mean number of failures and actual failures.

The results are tabulated in Table 6. Using the mean number of failures for the period July 1968 through September 1970 would have provided a better forecasting tool than the model developed here.
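The "feel for accuracy" comparison can be sketched in a few lines of Python. The figures below are hypothetical and are not the values in Tables 5 and 6; the function name is illustrative. For each month the sketch computes the model's absolute error and the absolute error of a constant forecast equal to the historical mean, then records which was closer.

```python
def compare_to_mean(forecasts, actuals, mean_failures):
    """Per-month absolute errors of a model forecast and of a constant-mean forecast.

    Returns (model_error, mean_error, closer) tuples; ties are credited to the mean.
    """
    rows = []
    for forecast, actual in zip(forecasts, actuals):
        model_err = abs(forecast - actual)
        mean_err = abs(mean_failures - actual)
        closer = "model" if model_err < mean_err else "mean"
        rows.append((model_err, mean_err, closer))
    return rows

# Hypothetical figures for illustration only (not the values in Tables 5 and 6)
model_forecasts = [41.0, 18.5, 30.2, 55.0]
actual_failures = [27.0, 25.0, 29.0, 24.0]
historical_mean = 26.0

rows = compare_to_mean(model_forecasts, actual_failures, historical_mean)
for model_err, mean_err, closer in rows:
    print(f"model {model_err:5.1f}   mean {mean_err:5.1f}   closer: {closer}")
wins = sum(1 for _, _, closer in rows if closer == "mean")
print(f"mean closer in {wins} of {len(rows)} months")
```

A constant-mean forecast is a deliberately naive benchmark; a regression model that cannot beat it month by month adds no forecasting value, which is the point the comparison makes.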


Table 5

[Engine failure forecasts for Techniques 1 and 2; tabular values illegible in this copy.]

Table 6

[Absolute differences between forecast and actual failures, and between mean and actual failures; tabular values illegible in this copy.]

*Lowest absolute difference (per month)


Chapter  4 


CONCLUSIONS  AND  RECOMMENDATIONS 

This research effort endeavored to show that TF33-3 engine failures are dependent upon some combination of historical flying hour and sortie data. A method of arrangement was undertaken which transformed the data collected on the B-52H aircraft fleet into specific historical groupings. These specific groups of data were then statistically analyzed and a forecasting model developed. The statistical analysis was performed by the application of multiple correlation and regression techniques to the data. Then monthly forecasts were made for a period of eleven months. This chapter interprets the model behavior, draws conclusions, and makes recommendations for further study.

Interpretation of the
Model Behavior

The following relationships were found to exist in the basic data:

1. It was determined by the least squares method that the relationship between engine failures and flying hours was linear. A coefficient of determination ($R^2$) of 0.3137 indicated that the curve fit was relatively weak.

2. A least squares linear relationship was also found to exist between engine failures and sorties. Again, a low $R^2$ of 0.3099 indicated a relatively weak curve fit.

3. The best curve fit of sorties versus flying hours was determined to be a curvilinear function with an $R^2$ of 0.7830.

The population net regression coefficients for each model were tested for significance in two separate ways because of the excessive number of variables present in each model equation. First, the overall significance of each model's regression plane was tested with an F test; in each test it was determined that regression was present in the population and that the improvement obtained by fitting these regression planes was not due to chance. Second, a t test was applied to each of the population net regression coefficients to test its individual significance. At the .05 significance level, these tests were unsuccessful in limiting any model equation to fewer than eighteen significant variables.

Of the four basic assumptions which should be verified before a model developed by multiple regression can be usefully employed, only three were confirmed. The assumptions of linearity, homoscedasticity, and normality of the residuals were determined to be valid. No applicable test could be found for statistical independence among the residuals; therefore, that assumption had to be left untested. The tests for "critical" levels of multicollinearity were considered inconclusive. However, when near-perfect correlation exists, as it does between several of the independent variables, multicollinearity very likely does exist to a critical degree.

Attempts to forecast engine failures with the developed regression models resulted in predictions that deviated widely from the failures which actually occurred. The absolute difference between the forecast and actual engine failures was in several cases twice the actual failure figure. The mean number of engine failures for the period July 1968 through September 1970 was a better predictor of engine failures than were the regression models. In fact, the mean failure figure was closer to the actual failures for all but three forecasts (see Table 6).

Conclusions 

The research hypothesis under test in this thesis, that a combination of flying hours and sorties can be utilized to yield accurate jet engine failure forecasts, could not be accepted because of the poor forecasting ability of the developed regression models. Even though the research hypothesis could not be accepted, the use of sorties and flying hours should not be discounted as determinants of jet engine failures. One of the initial premises was that the inclusion of sorties in developing a forecast model would improve the forecasting ability of that model. This premise should be evaluated further. Of the twenty variables included in the regression model, twelve were based upon historical sortie data. Additionally, in each case, the independent variable which explained the greatest portion of variation in the dependent variable was the current month's observed sorties. These facts would seem to indicate that sorties do have a very significant impact upon engine failures and should be further investigated.

The critically high multicollinearity present in the model was apparently the dominant factor causing the regression model's failure to predict accurately. The authors believe that this phenomenon was introduced by the method used to cumulatively arrange the data input. The multicollinearity present in these 64 "created" variables tended to make the individual net regression coefficients unreliable to a degree high enough to affect the forecasting ability of the model.
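The conjecture that the cumulative arrangement itself manufactured the multicollinearity can be illustrated with a small simulation. The Python sketch below assumes, hypothetically, that the "created" variables were running totals over overlapping month windows and that the underlying monthly figures are independent; neither assumption is taken from Chapter 2, and the names are illustrative. Two overlapping totals share most of their terms, so their correlation is high by construction (3/√12 ≈ 0.87 for 3- and 4-month totals), whereas single-month observations are uncorrelated under the same assumption.

```python
import math
import random

def pearson_r(xs, ys):
    """Simple (Pearson) correlation between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

random.seed(1971)
# Hypothetical monthly sortie counts drawn independently, for illustration only
monthly = [random.gauss(300.0, 40.0) for _ in range(240)]
months = range(3, len(monthly))

# Assumed cumulative arrangement: running totals over the latest 3 and 4 months
cum3 = [sum(monthly[t - 2:t + 1]) for t in months]
cum4 = [sum(monthly[t - 3:t + 1]) for t in months]

# Individual-month arrangement: single observations one and two months back
lag1 = [monthly[t - 1] for t in months]
lag2 = [monthly[t - 2] for t in months]

r_cum = pearson_r(cum3, cum4)
r_lag = pearson_r(lag1, lag2)
print(f"cumulative totals: r = {r_cum:.3f}")   # expected value about 0.87
print(f"individual months: r = {r_lag:.3f}")   # expected value about 0
```

Real flying hour and sortie series are likely autocorrelated, so individual months would not be as clean as this; the sketch only shows how overlapping sums build in correlation, which motivates Recommendation 1 below.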

Recommendations 

As a result of the above findings and conclusions, the following recommendations for further studies are made:

1. The hypothesis used in this thesis should be tested further by using different data arrangement techniques. Specifically, the data should be arranged so that minimum multicollinearity is introduced into the regression model. One practical solution would be to use the individual month's observations as the independent variables.

2. A thorough study of the basic flying hour, sortie, and engine failure data should be undertaken to define its behavioral patterns.
Plot of Residuals (Y-axis) vs. Variable 7 (X-axis)

Plot of Residuals (Y-axis) vs. Variable [illegible] (X-axis)

Plot of Residuals (Y-axis) vs. Variable 25 (X-axis)

Plot of Residuals (Y-axis) vs. Variable 37 (X-axis)

Plot of Residuals (Y-axis) vs. Variable [illegible] (X-axis)

Plot of Residuals (Y-axis) vs. Variable 62 (X-axis)